Abstract. The space requirements of an $m$-ary search tree satisfy a well-known phase
transition: when $m \le 26$, the second-order asymptotics is Gaussian. When $m \ge 27$, it
is no longer Gaussian and a limit $W$ of a complex-valued martingale arises. We
show that the distribution of $W$ has a square-integrable density on the complex plane,
that its support is the whole complex plane, and that it has finite exponential moments.
The proofs are based on the study of the distributional equation
$$W \overset{\mathcal L}{=} \sum_{k=1}^m V_k^{\lambda} W_k,$$
where $V_1, \dots, V_m$ are the spacings of $(m-1)$ independent random variables uniformly
distributed on $[0,1]$, $W_1, \dots, W_m$ are independent copies of $W$ which are also independent
of $(V_1, \dots, V_m)$, and $\lambda$ is a complex number.
1 Introduction

Search trees are fundamental data structures in computer science, used in searching and sorting.
For integers $m \ge 2$, $m$-ary search trees generalize the binary search tree. The quantity $m$ is
called the branching factor.

A random $m$-ary search tree is an $m$-ary tree in which each node has the capacity to contain
$(m-1)$ elements, called the data or the keys. The keys can be considered as i.i.d. random
variables $x_i$, $i \ge 1$, with any diffuse (i.e. atomless) distribution on the interval $[0,1]$.

The tree $T_n$, $n \ge 0$, is recursively defined as follows: $T_0$ is reduced to an empty node-root;
$T_1$ is reduced to a node-root which contains $x_1$; $T_2$ is reduced to a node-root which contains
$x_1$ and $x_2$; $\dots$; $T_{m-1}$ still has one node-root, containing $x_1, \dots, x_{m-1}$. As soon as the
$(m-1)$-th key is inserted in the root, $m$ empty subtrees of the root are created, corresponding from
left to right to the $m$ ordered intervals $I_1 = \,]0, x_{(1)}[, \dots, I_m = \,]x_{(m-1)}, 1[$, where
$0 < x_{(1)} < \dots < x_{(m-1)} < 1$ are the ordered first $(m-1)$ keys. Each following key $x_m, \dots$ is
recursively inserted in the subtree corresponding to the unique interval $I_j$ to which it belongs.
As soon as a node is saturated, $m$ empty subtrees of this node are created.

1 Université de Versailles-St-Quentin, Laboratoire de Mathématiques de Versailles, CNRS,
UMR 8100, 45, avenue des États-Unis, 78035 Versailles CEDEX, France.

2 LMAM, Université de Bretagne Sud, Campus de Tohannic, BP 573, 56017 Vannes, France.

3 Université de Versailles-St-Quentin, Laboratoire de Mathématiques de Versailles, CNRS,
UMR 8100, 45, avenue des États-Unis, 78035 Versailles CEDEX, France.
For each $i \in \{1, \dots, m-1\}$ and $n \ge 1$, $X_n^{(i)}$ is the number of nodes in $T_n$ which contain
$(i-1)$ keys (and $i$ gaps or free places) after insertion of the $n$-th key; such nodes are called
nodes of type $i$. We only take into consideration the external nodes, not the internal nodes,
which are the saturated ones. The vector $X_n = {}^t(X_n^{(1)}, \dots, X_n^{(m-1)})$ is called the composition
vector of the $m$-ary search tree. It provides a model for the space requirement of the algorithm.
By spreading the input data in $m$ directions instead of only 2, as is the case for a binary search
tree, one seeks to have shorter path lengths and thus quicker searches. One can refer to Mahmoud's
book [9] for further details on search trees.

The following figure is an example of a 4-ary search tree obtained by insertion of the successive
numbers 0.3, 0.1, 0.4, 0.15, 0.9, 0.2, 0.6, 0.5, 0.35, 0.8, 0.97, 0.93, 0.23, 0.84, 0.62, 0.64, 0.33, 0.83.
The corresponding composition vector is $X_{18} = {}^t(9, 2, 2)$.

[Figure: the 4-ary search tree built from these 18 keys.]
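The insertion rule can be simulated directly. The Python sketch below is only an illustration: the names `Node`, `insert` and `composition_vector` are hypothetical, not taken from the paper. It rebuilds the 4-ary example above and recovers its composition vector.

```python
from __future__ import annotations

import bisect


class Node:
    """Node of an m-ary search tree: at most m - 1 sorted keys; m children once saturated."""

    def __init__(self) -> None:
        self.keys: list[float] = []
        self.children: list[Node] | None = None  # None while the node is external


def insert(root: Node, key: float, m: int) -> None:
    """Insert one key: walk down saturated nodes along the interval I_j that contains it."""
    node = root
    while node.children is not None:
        node = node.children[bisect.bisect(node.keys, key)]
    bisect.insort(node.keys, key)
    if len(node.keys) == m - 1:  # the node is now saturated: create m empty subtrees
        node.children = [Node() for _ in range(m)]


def composition_vector(root: Node, m: int) -> list[int]:
    """X_n as a list: entry i - 1 counts the external nodes of type i (holding i - 1 keys)."""
    counts = [0] * (m - 1)
    stack = [root]
    while stack:
        node = stack.pop()
        if node.children is None:
            counts[len(node.keys)] += 1
        else:
            stack.extend(node.children)
    return counts


keys = [0.3, 0.1, 0.4, 0.15, 0.9, 0.2, 0.6, 0.5, 0.35,
        0.8, 0.97, 0.93, 0.23, 0.84, 0.62, 0.64, 0.33, 0.83]
root = Node()
for x in keys:
    insert(root, x, m=4)
print(composition_vector(root, m=4))  # [9, 2, 2]
```

A convenient sanity check: each insertion adds exactly one gap. A key either fills one gap, or saturates a node and replaces its $m-1$ gaps by $m$ new empty nodes. Hence $\sum_i i\,X_n^{(i)} = n + 1$; here $9 + 2 \cdot 2 + 3 \cdot 2 = 19$.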




A large literature is devoted to the asymptotic behavior of this composition vector, and a famous
phase transition appears. When $m \le 26$, the random vector $X_n$ satisfies a central limit theorem, with
convergence in distribution to a Gaussian vector: see Mahmoud's book [9] or Janson [6] for a
vectorial treatment.

When $m \ge 27$, an almost sure asymptotics for the composition vector has been obtained in [2]:
$$X_n = n v_1 + \Re\bigl(n^{\lambda_2} W v_2\bigr) + o\bigl(n^{\sigma_2}\bigr) \quad \text{a.s.}, \tag{1}$$
where $\lambda_2 = \sigma_2 + i\tau_2$ is the root of the polynomial
$$\prod_{k=1}^{m-1} (z + k) - m! \tag{2}$$
having the second largest real part $\sigma_2$ and a positive imaginary part $\tau_2$, $v_1$ and $v_2$ are two
deterministic vectors, and $W$ is the limit of a complex-valued martingale that admits moments of
all positive orders.

Several conjectures about the second-order complex-valued limit distribution $W$ remain open (see [2],
[10], Chern and Hwang [3], Mahmoud [9], Janson [6]).

A significant step was achieved by Fill and Kapur in [3], who establish that $W$ satisfies the
following distributional equation, called the smoothing equation:
$$W \overset{\mathcal L}{=} \sum_{k=1}^m V_k^{\lambda_2} W_k, \tag{3}$$
where $V_1, \dots, V_m$ are the spacings of $(m-1)$ independent random variables uniformly distributed
on $[0,1]$, and $W_1, \dots, W_m$ are independent copies of $W$ which are also independent of $(V_1, \dots, V_m)$.
The precise definition of the $V_k$ is given below. By a contraction method, Fill and Kapur prove that
$W$ is the unique solution of Eq. (3) in the space $\mathcal M_2(C)$ of square-integrable probability measures
having $C = \mathbb E(W)$ as expectation. The present paper is based on this characterization of $W$.

It has recently been proved [1] that the continuous-time embedding of the process $(X_n)_n$ has an
analogous asymptotic behavior, with a second-order term which is a solution of another
distributional equation. Inspired by this study of the continuous-time case, we prove the
following theorem.

Theorem 1 Let $W$ be the second-order limit distribution of an $m$-ary search tree for $m \ge 27$,
defined by (1).

(i) The support of $W$ is the whole complex plane.

(ii) The law of $W$ admits a continuous square-integrable density on $\mathbb C$.

(iii) $\mathbb E\, e^{\delta |W|} < \infty$ for some $\delta > 0$. Consequently, the exponential moment generating
series of $W$ has a positive radius of convergence.



Thanks to the results of Fill and Kapur [3], these results are immediate corollaries of Theorems 3
and 6, proved in the next two sections.

In the whole sequel, let $V_1, \dots, V_m$ be the spacings of $(m-1)$ independent random variables
uniformly distributed on $[0,1]$. In other words, let $U_1, \dots, U_{m-1}$ be independent random variables
uniformly distributed on $[0,1]$ and let $U_{(1)} < \dots < U_{(m-1)}$ be their order statistics. Set also
$U_{(0)} := 0$ and $U_{(m)} := 1$. For any $k \in \{1, \dots, m\}$, the random variable $V_k$ is defined by
$$V_k := U_{(k)} - U_{(k-1)}.$$
The variables $V_k$ are Beta$(1, m-1)$-distributed and satisfy $\sum_{k=1}^m V_k = 1$ almost surely.
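As a quick numerical illustration of this construction, the Python sketch below samples the spacings as described: it sorts $m-1$ uniforms and adds the endpoints $0$ and $1$. It then compares an empirical moment of $V_1$ with the Beta$(1, m-1)$ formula $\mathbb E V_1^s = \Gamma(m)\Gamma(s+1)/\Gamma(m+s)$. The helper names, the seed and the sample size are arbitrary choices made for the illustration.

```python
from __future__ import annotations

import math
import random


def spacings(m: int, rng: random.Random) -> list[float]:
    """Spacings V_1, ..., V_m of m - 1 i.i.d. uniform points on [0, 1]."""
    cuts = sorted(rng.random() for _ in range(m - 1))  # order statistics U_(1) <= ... <= U_(m-1)
    points = [0.0] + cuts + [1.0]  # conventions U_(0) := 0 and U_(m) := 1
    return [points[k] - points[k - 1] for k in range(1, m + 1)]


def beta_moment(m: int, s: float) -> float:
    """E[V_1^s] for V_1 ~ Beta(1, m - 1), i.e. Gamma(m) Gamma(s + 1) / Gamma(m + s)."""
    return math.exp(math.lgamma(m) + math.lgamma(s + 1) - math.lgamma(m + s))


rng = random.Random(2024)          # arbitrary seed
m, s, n_samples = 5, 1.5, 50_000   # arbitrary illustration parameters
samples = [spacings(m, rng) for _ in range(n_samples)]
empirical = sum(v[0] ** s for v in samples) / n_samples
print(f"empirical E[V_1^{s}] = {empirical:.4f}, exact = {beta_moment(m, s):.4f}")
```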

Remark 2 Details about the roots of (2) can be found in Hennequin [5] and Mahmoud [9]. Note that for
$m = 2$, the polynomial (2) has the unique root $\lambda = 1$. For $m \ge 4$, it is known that if $\lambda_2$ is a root
of the polynomial (2) having the second largest real part, then $\lambda_2$ is non real, $\Re\lambda_2 < 1$, $\Re\lambda_2 > 0$
if and only if $m \ge 14$, and
$$\Re\lambda_2 > \tfrac12 \iff m \ge 27.$$
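These facts about $\lambda_2$ can be checked numerically. The sketch below is an illustration only, not the method of [5] or [9]. It approximates all roots of (2) with the Durand–Kerner iteration, evaluating the polynomial in product form to avoid expanding its large coefficients, and then keeps the root with the second largest real part. The function names, starting points and tolerances are assumptions of this sketch.

```python
from __future__ import annotations

import math


def char_poly(z: complex, m: int) -> complex:
    """p(z) = prod_{k=1}^{m-1} (z + k) - m!, evaluated in product form for stability."""
    prod = complex(1.0)
    for k in range(1, m):
        prod *= z + k
    return prod - math.factorial(m)


def char_roots(m: int, max_iter: int = 2000, tol: float = 1e-13) -> list[complex]:
    """Approximate the m - 1 roots of p with the Durand-Kerner (Weierstrass) iteration."""
    n = m - 1
    # If |z| >= 2m then |z + k| >= m + 1 for every k < m, and (m + 1)^(m - 1) > m!,
    # so p has no root there: all roots lie in the disc |z| < 2m.
    radius = 2.0 * m
    zs = [radius * (0.4 + 0.9j) ** j for j in range(n)]  # usual non-symmetric starting points
    for _ in range(max_iter):
        largest_step = 0.0
        for i in range(n):
            denom = complex(1.0)
            for j in range(n):
                if j != i:
                    denom *= zs[i] - zs[j]
            step = char_poly(zs[i], m) / denom
            zs[i] -= step
            largest_step = max(largest_step, abs(step))
        if largest_step < tol * radius:
            break
    return zs


def lambda2(m: int) -> complex:
    """Root of p with the second largest real part, imaginary part taken >= 0 (m >= 3)."""
    others = [z for z in char_roots(m) if abs(z - 1.0) > 1e-6]  # discard lambda_1 = 1
    best = max(others, key=lambda z: z.real)
    return complex(best.real, abs(best.imag))


for m in (13, 14, 26, 27):
    lam = lambda2(m)
    print(m, f"Re = {lam.real:+.4f}", f"Im = {lam.imag:.4f}")
```

The values of $m$ printed are chosen on both sides of the two thresholds $m = 14$ and $m = 27$ stated in the remark. For $m = 4$, the exact roots $1$ and $(-7 \pm i\sqrt{23})/2$ give an easy check.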



2 Support 

The limit distribution $W$ satisfies (3). From now on, we consider the solutions of the distributional
equation
$$Z \overset{\mathcal L}{=} \sum_{k=1}^m V_k^{\lambda} Z_k, \tag{4}$$
where $V_1, \dots, V_m$ are the spacings of $(m-1)$ independent random variables uniformly distributed
on $[0,1]$, $Z_1, \dots, Z_m$ are independent copies of $Z$ which are also independent of $(V_1, \dots, V_m)$, and
$\lambda$ is a non real complex number.

We assume that
$$\lambda \text{ is a non real root of (2) having a positive real part } \sigma. \tag{5}$$
Since $V_1, \dots, V_m$ are Beta$(1, m-1)$-distributed, the condition $\Re(\lambda) > 0$ guarantees that
$\mathbb E|V_1^{\lambda}| < \infty$. Moreover, we are interested in solutions of (4) having a nonzero expectation,
and the existence of such solutions implies that $\lambda$ is a root of (2). Note that when $m \le 13$, no $\lambda$
satisfies (5).

The following theorem implies Theorem 1(i) because $W$ is integrable with $\mathbb E W \neq 0$ (see [10] for
instance).



Theorem 3 Let $\lambda$ be a non real complex number having a positive real part. If $Z$ is a solution of
(4) having a nonzero expectation, then the support of $Z$ is the whole complex plane.

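The proofs of Theorem 3 and Theorem 6 make use of the complex-valued random variable
$$A := \sum_{k=1}^m V_k^{\lambda}. \tag{6}$$
Notice that the existence of an integrable solution $Z$ of (4) such that $\mathbb E(Z) \neq 0$ implies, by taking
expectations in (4) and using the independence of $(V_1, \dots, V_m)$ and $(Z_1, \dots, Z_m)$, that
$$\mathbb E(A) = 1, \tag{7}$$
which just means that $\lambda$ is a root of the polynomial (2).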
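For completeness, here is the short computation behind the equivalence between (7) and (2), which the text leaves implicit. It only uses the Beta$(1, m-1)$ density $(m-1)(1-v)^{m-2}$ of $V_1$ on $[0,1]$, and it is valid for $\Re\lambda > -1$ so that the integral converges.

```latex
\begin{align*}
\mathbb E\bigl(V_1^{\lambda}\bigr)
  &= \int_0^1 v^{\lambda}\,(m-1)(1-v)^{m-2}\,dv
   = (m-1)\,B(\lambda+1,\,m-1)
   = \frac{\Gamma(m)\,\Gamma(\lambda+1)}{\Gamma(\lambda+m)}
   = \frac{(m-1)!}{\prod_{k=1}^{m-1}(\lambda+k)},\\
\mathbb E(A) &= m\,\mathbb E\bigl(V_1^{\lambda}\bigr)
   = \frac{m!}{\prod_{k=1}^{m-1}(\lambda+k)},
\qquad\text{hence}\qquad
\mathbb E(A) = 1 \iff \prod_{k=1}^{m-1}(\lambda+k) - m! = 0 .
\end{align*}
```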

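Proof of Theorem 3. For a complex-valued random variable $X$, we denote its support by
$$\operatorname{Supp}(X) = \{x \in \mathbb C,\ \forall \varepsilon > 0,\ \mathbb P(|X - x| < \varepsilon) > 0\}.$$
Let $Z$ be a solution of (4) having a nonzero expectation. We first prove that
$$\forall a \in \mathbb C,\ \forall z \in \mathbb C, \quad a \in \operatorname{Supp}(A) \text{ and } z \in \operatorname{Supp}(Z) \implies az \in \operatorname{Supp}(Z). \tag{8}$$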



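Indeed, let $\varepsilon > 0$, $a \in \operatorname{Supp}(A)$ and $z \in \operatorname{Supp}(Z)$. Let also $Z_1, \dots, Z_m$ be i.i.d. copies of
$Z$, independent of $(V_1, \dots, V_m)$. Then, with positive probability, $|A - a| < \varepsilon$ and $|Z_k - z| < \varepsilon$ for
any $k$. Therefore, with positive probability,
$$\Bigl|\sum_{k=1}^m V_k^{\lambda} Z_k - az\Bigr| = \Bigl|\sum_{k=1}^m V_k^{\lambda} (Z_k - z) + z(A - a)\Bigr| \le m\varepsilon + |z|\varepsilon,$$
since $|V_k^{\lambda}| = V_k^{\sigma} \le 1$. The positive $\varepsilon$ being arbitrary, this shows that
$az \in \operatorname{Supp}(V_1^{\lambda} Z_1 + \dots + V_m^{\lambda} Z_m)$, which implies that $az \in \operatorname{Supp}(Z)$ because of (4).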

Let $z \in \operatorname{Supp}(Z) \setminus \{0\}$. Such a $z$ exists because $\mathbb E(Z) \neq 0$. Iterating (8), any complex number
of the form $a_1 \cdots a_n z$, where $a_1, \dots, a_n \in \operatorname{Supp}(A)$, belongs to $\operatorname{Supp}(Z)$. Therefore, Lemmas 4
and 5 below imply that $\operatorname{Supp}(Z)$ contains $\mathbb C \setminus \{0\}$, which suffices to conclude since the support
of a probability measure is a closed set. □

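Lemma 4 There exist $c, c' \in \mathbb C \setminus \{0\}$ and respective open neighbourhoods $\mathcal V$ and $\mathcal V'$ of $c$
and $c'$ such that $|c| > 1$, $|c'| < 1$ and $\mathcal V \cup \mathcal V' \subset \operatorname{Supp}(A)$.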
Proof. Obviously,
$$\operatorname{Supp}(A) = \overline{\Bigl\{\sum_{k=1}^m t_k^{\lambda},\ 0 < t_k < 1,\ \sum_{k=1}^m t_k = 1\Bigr\}}.$$
In particular, since $\Re(\lambda) > 0$ (so that $t^{\lambda} \to 0$ as $t \to 0$), $\operatorname{Supp}(A)$ contains the set $f([0,1]^2)$
(the image of $[0,1]^2$ by $f$), where $f$ is defined by
$$f : [0,1]^2 \to \mathbb C, \qquad (s, t) \mapsto (st)^{\lambda} + \bigl(s(1-t)\bigr)^{\lambda} + (1-s)^{\lambda}.$$
We show that there exist $(s_c, t_c)$ and $(s_{c'}, t_{c'})$ in $]0,1[^2$ such that $c := f(s_c, t_c)$ and
$c' := f(s_{c'}, t_{c'})$ satisfy $|c| > 1$, $0 < |c'| < 1$, and $f$ is a local diffeomorphism in some respective
neighbourhoods of $(s_c, t_c)$ and $(s_{c'}, t_{c'})$, which implies the result.
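Let $\sigma$ and $\tau$ be respectively the real part and the imaginary part of $\lambda$. By (5) and Remark 2,
$0 < \sigma < 1$. We assume that $\tau > 0$; if not, replace $Z$ and $\lambda$ by their conjugates. For any integer
$k \ge 1$, denote
$$u_k = \exp\Bigl(-\frac{2k\pi}{\tau}\Bigr) \qquad \text{and} \qquad u'_k = \exp\Bigl(\frac{\pi - 2k\pi}{\tau}\Bigr).$$
Then, $u_k$ and $u'_k$ are reals in $]0,1[$ that tend to $0$ as $k$ tends to infinity, and they satisfy
$$u_k^{\lambda} = u_k^{\sigma} \in\ ]0,1[ \qquad \text{and} \qquad u'^{\lambda}_k = -u'^{\sigma}_k \in\ ]-1,0[.$$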

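Denote moreover
$$s_k := u_k + u_k^2, \qquad t_k := \frac{1}{1 + u_k}, \qquad s'_k := u'_k + u'^2_k, \qquad t'_k := \frac{1}{1 + u'_k},$$
so that $s_k t_k = u_k$, $s_k(1 - t_k) = u_k^2$, and similarly for $(s'_k, t'_k)$.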



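As $0 < \sigma < 1$, we have $(1 - u_k - u_k^2)^{\lambda} = 1 - \lambda u_k + O(u_k^2)$ and $u_k = o(u_k^{\sigma})$, hence
$$|f(s_k, t_k)| = \bigl|u_k^{\lambda} + u_k^{2\lambda} + (1 - u_k - u_k^2)^{\lambda}\bigr| = \bigl|1 + u_k^{\sigma} + u_k^{2\sigma} + O(u_k)\bigr| = 1 + u_k^{\sigma} + o(u_k^{\sigma})$$
and
$$|f(s'_k, t'_k)| = \bigl|u'^{\lambda}_k + u'^{2\lambda}_k + (1 - u'_k - u'^2_k)^{\lambda}\bigr| = \bigl|1 - u'^{\sigma}_k + u'^{2\sigma}_k + O(u'_k)\bigr| = 1 - u'^{\sigma}_k + o(u'^{\sigma}_k)$$
when $k$ tends to infinity, so that $|f(s_k, t_k)| > 1$ and $0 < |f(s'_k, t'_k)| < 1$ when $k$ is large enough.
It remains to show that $f$ is a local diffeomorphism in neighbourhoods of $(s_k, t_k)$ and $(s'_k, t'_k)$.
Let $\Phi : [0,1]^2 \to \mathbb R^2$ be defined as
$$\Phi(s, t) := \bigl(\Re f(s, t),\ \Im f(s, t)\bigr).$$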

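By the inverse function theorem, it suffices to show that the Jacobian of $\Phi$ does not vanish at
suitable $(s_c, t_c)$ and $(s_{c'}, t_{c'})$ to show that $f$ is a local diffeomorphism at these points. Since this
Jacobian equals $\Im\bigl(\overline{\partial f/\partial s}\ \partial f/\partial t\bigr)$, this sufficient condition is equivalent to requiring that
$$\frac{\partial f}{\partial s} \times \overline{\frac{\partial f}{\partial t}}$$
be non real (the overline denotes complex conjugation).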
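For any $k \ge 1$, since $s_k t_k = u_k$ and $s_k(1 - t_k) = u_k^2$, a direct computation gives
$$\frac{\partial f}{\partial s}(s_k, t_k) \times \overline{\frac{\partial f}{\partial t}(s_k, t_k)}
= |\lambda|^2 (1 + u_k)\bigl(u_k^{\sigma} - u_k^{2\sigma - 1}\bigr)\left(\frac{u_k^{\sigma} + u_k^{2\sigma}}{s_k} - (1 - s_k)^{\lambda - 1}\right).$$
Since $\sigma < 1$, the real factor $(1 + u_k)\bigl(u_k^{\sigma} - u_k^{2\sigma - 1}\bigr)$ does not vanish, so the above number is
real if and only if $(1 - s_k)^{\lambda} \in \mathbb R$, i.e. if and only if $\tau \log(1 - s_k) \in \pi\mathbb Z$. Since $s_k \neq 0$ and $s_k$
tends to zero when $k$ tends to infinity, $\tau \log(1 - s_k) \notin \pi\mathbb Z$ as soon as $k$ is large enough.
Therefore, taking $(s_c, t_c) = (s_k, t_k)$ for $k$ large enough suffices to get the result. An argument of the
same kind applied to the sequence $(s'_k, t'_k)_k$ leads to the existence of $(s_{c'}, t_{c'})$. □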
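The construction in this proof can be probed numerically. The Python sketch below takes a hypothetical value $\lambda = 0.6 + 2i$; any $\lambda$ with $0 < \Re\lambda < 1$ and $\Im\lambda > 0$ would do, since these are the only properties the argument uses. It builds $(s_k, t_k)$ and $(s'_k, t'_k)$ from $u_k$ and $u'_k$ and checks the three claims: $|f(s_k, t_k)| > 1$, $0 < |f(s'_k, t'_k)| < 1$, and $\partial_s f \times \overline{\partial_t f}$ non real. The partial derivatives are written in closed form, and only a few small values of $k$ are used, because $1 - t_k$ suffers from floating-point cancellation when $u_k$ is tiny.

```python
from __future__ import annotations

import math

LAM = 0.6 + 2.0j  # hypothetical lambda: any value with 0 < Re(lambda) < 1 and Im(lambda) > 0
SIGMA, TAU = LAM.real, LAM.imag


def f(s: float, t: float) -> complex:
    """f(s, t) = (st)^lambda + (s(1-t))^lambda + (1-s)^lambda, principal powers of reals > 0."""
    return (s * t) ** LAM + (s * (1 - t)) ** LAM + (1 - s) ** LAM


def grad_product(s: float, t: float) -> complex:
    """(df/ds) * conj(df/dt); the Jacobian of (Re f, Im f) vanishes iff this number is real."""
    a, b = (s * t) ** LAM, (s * (1 - t)) ** LAM
    df_ds = LAM * ((a + b) / s - (1 - s) ** (LAM - 1))
    df_dt = LAM * (a / t - b / (1 - t))
    return df_ds * df_dt.conjugate()


def points_k(k: int) -> tuple[tuple[float, float], tuple[float, float]]:
    """(s_k, t_k) and (s'_k, t'_k) built from u_k and u'_k as in the proof."""
    u = math.exp(-2 * k * math.pi / TAU)
    u2 = math.exp((math.pi - 2 * k * math.pi) / TAU)
    return (u + u * u, 1 / (1 + u)), (u2 + u2 * u2, 1 / (1 + u2))


for k in range(2, 5):
    (s, t), (s2, t2) = points_k(k)
    print(k, abs(f(s, t)), abs(f(s2, t2)),
          grad_product(s, t).imag, grad_product(s2, t2).imag)
```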

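Lemma 5 Let $\mathcal V$ and $\mathcal V'$ be respectively open neighbourhoods of $c \in \mathbb C$ and $c' \in \mathbb C$ with $|c| > 1$
and $0 < |c'| < 1$, which do not contain $0$. Let
$$M := \{v_1 v_2 \cdots v_n,\ n \ge 1,\ v_1, v_2, \dots, v_n \in \mathcal V \cup \mathcal V'\}.$$
Then $M = \mathbb C \setminus \{0\}$.

Proof. Let $\ell$ and $\ell'$ be logarithms of $c$ and $c'$ respectively, so that $\Re\ell = \log|c| > 0$ and
$\Re\ell' = \log|c'| < 0$. Replacing $\ell$ by $\ell + 2i\pi$ if necessary, we may assume that $\ell$ and $\ell'$ are linearly
independent over $\mathbb R$ (indeed $\ell$ and $\ell + 2i\pi$ cannot both be collinear with $\ell'$, since $\Re\ell' \neq 0$). Let $U$
and $U'$ be respectively open neighbourhoods of $\ell$ and $\ell'$ such that $\exp(U) \subset \mathcal V$ and $\exp(U') \subset \mathcal V'$.
Denote by $\mathcal M$ the additive submonoid of $\mathbb C / 2i\pi\mathbb Z$ generated by $U \cup U' \bmod 2i\pi$; it is the set of classes
$$\mathcal M := \{u_1 + u_2 + \dots + u_n \bmod 2i\pi,\ n \ge 1,\ u_1, u_2, \dots, u_n \in U \cup U'\}.$$
We prove hereunder that $\mathcal M = \mathbb C / 2i\pi\mathbb Z$. Taking the exponential, this suffices to prove the lemma.

Take an integer $p \ge 1$ large enough so that $pU$ contains a whole mesh of the lattice generated by $\ell$
and $\ell'$, i.e. such that $pU \supset p\ell + [0,1]\ell + [0,1]\ell'$. Since $pU \subset U + \dots + U$ ($p$ terms), $\ell \in U$ and
$\ell' \in U'$, $\mathcal M$ contains the classes mod $2i\pi$ of the sector $S := p\ell + \mathbb R_{\ge 0}\ell + \mathbb R_{\ge 0}\ell' \subset \mathbb C$. Since
$\Re\ell \times \Re\ell' < 0$, the cone $\mathbb R_{\ge 0}\ell + \mathbb R_{\ge 0}\ell'$ contains a nonzero purely imaginary number in its
interior, so that, when $z$ is any complex number, there exists $q \in \mathbb Z$ such that $z - 2i\pi q \in S$, which
proves the result. □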
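The last step of this proof is constructive. Given $\ell$, $\ell'$ and $p$, an integer $q$ with $z - 2i\pi q \in S$ can be read off the real coordinates of $z - p\ell$ and of $2i\pi$ in the basis $(\ell, \ell')$. The two coordinates of $2i\pi$ are nonzero and of the same sign, because $\Re\ell > 0 > \Re\ell'$. The Python sketch below does this with hypothetical values of $c$, $c'$ and $p$. It only illustrates the sector argument, not the full covering of $\mathbb C \setminus \{0\}$ by $M$.

```python
from __future__ import annotations

import cmath
import math

TWO_PI_I = 2j * math.pi


def real_coords(z: complex, l1: complex, l2: complex) -> tuple[float, float]:
    """Real (a, b) with z = a*l1 + b*l2, by Cramer's rule (l1, l2 R-linearly independent)."""
    det = l1.real * l2.imag - l1.imag * l2.real
    a = (z.real * l2.imag - z.imag * l2.real) / det
    b = (l1.real * z.imag - l1.imag * z.real) / det
    return a, b


def choose_logs(c: complex, c_prime: complex) -> tuple[complex, complex]:
    """Logarithms l, l' of c, c' that are not R-collinear (Re l > 0 > Re l' if |c| > 1 > |c'|)."""
    l1, l2 = cmath.log(c), cmath.log(c_prime)
    if abs(l1.real * l2.imag - l1.imag * l2.real) < 1e-12:
        l1 += TWO_PI_I  # another determination of log c; Re l2 != 0 breaks the collinearity
    return l1, l2


def shift_into_sector(z: complex, l1: complex, l2: complex, p: int) -> int:
    """An integer q with z - 2*i*pi*q in the sector S = p*l1 + R_{>=0} l1 + R_{>=0} l2."""
    a0, b0 = real_coords(z - p * l1, l1, l2)
    da, db = real_coords(TWO_PI_I, l1, l2)  # nonzero and of the same sign
    if da > 0:  # both coordinates decrease as q increases
        return math.floor(min(a0 / da, b0 / db))
    return math.ceil(max(a0 / da, b0 / db))  # both coordinates increase as q increases


l1, l2 = choose_logs(1.5 + 0.5j, 0.3 - 0.4j)  # hypothetical c and c'
for z in (3 - 7j, -10 + 2j, 25j):
    q = shift_into_sector(z, l1, l2, p=4)
    print(z, q, real_coords(z - TWO_PI_I * q - 4 * l1, l1, l2))
```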



3 Density and exponential moments 

As at the beginning of Section 2, Theorem 1(ii) and (iii) are straightforward corollaries of the
following theorem.

Theorem 6 Let $\lambda$ be a non real complex number and $Z$ a solution of (4) having a nonzero expectation.

(i) If $\Re(\lambda) > 0$, then $Z$ admits a continuous square-integrable density on $\mathbb C$.

(ii) If $\Re(\lambda) > \frac12$, then $\mathbb E\, e^{\delta |Z|} < \infty$ for some $\delta > 0$. Consequently, the exponential moment
generating series of $Z$ has a positive radius of convergence.

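Proof. It runs along the same lines as in [1] and uses the Fourier transform $\varphi$ of $Z$, namely
$$\varphi(t) := \mathbb E \exp\bigl(i \langle t, Z \rangle\bigr) = \mathbb E \exp\bigl(i\, \Re(\bar t Z)\bigr), \qquad t \in \mathbb C,$$
where $\langle x, y \rangle = \Re(x \bar y) = \Re(x)\Re(y) + \Im(x)\Im(y)$. In terms of Fourier transforms, Eq. (4) reads
$$\varphi(t) = \mathbb E \prod_{k=1}^m \varphi\bigl(V_k^{\bar\lambda}\, t\bigr), \qquad t \in \mathbb C.$$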
• To get (i), we prove that $\varphi$ is in $L^2(\mathbb C)$ because it is dominated by $|t|^{-\delta}$ for some $\delta > 1$, so
that the inverse Fourier–Plancherel transform provides a square-integrable density for $Z$. The guiding
idea consists in adapting methods (developed in [7] and [8]) usually applied to positive real-valued
random variables to the present complex-valued case. For any $r > 0$, denote
$$\psi(r) := \max_{|t| = r} |\varphi(t)|.$$
Using Theorem 3, one can mimic step by step the proof of Theorem 7.17 in [1] to get the result. We
just give hereunder an overview of this proof, written as successive hints.

Show first that Theorem 3 implies that $\psi(r) < 1$ for any $r > 0$. Then, notice that, since
$|V_k^{\bar\lambda} t| = V_k^{\sigma} |t|$ with $\sigma = \Re(\lambda)$, the Fourier form of Eq. (4) gives
$$\psi(r) \le \mathbb E \prod_{k=1}^m \psi\bigl(r V_k^{\sigma}\bigr), \qquad r > 0. \tag{10}$$
By Fatou's lemma, (10) implies that $\limsup_{r \to +\infty} \psi(r) \in \{0, 1\}$. Iterating inequality (10) suitably leads
to $\lim_{r \to +\infty} \psi(r) = 0$. Finally, applying (10) again, we can show that $\psi(r) = O(r^{-\delta})$ for some
$\delta > 1$, so that $\varphi$ is square integrable on $\mathbb C$, which leads to the result.

• To get (ii), as in [1], we use Mandelbrot's cascades. Denote $V := (V_1, V_2, \dots, V_m)$. Let $\mathcal U$ be the set
of finite sequences of positive integers between 1 and $m$, namely
$$\mathcal U := \bigcup_{n \ge 1} \{1, \dots, m\}^n.$$
Elements of $\mathcal U$ are denoted by concatenation. Let $V_u := (V_{u1}, V_{u2}, \dots, V_{um})$, $u \in \mathcal U$, be independent
copies of $V$ (also independent of $V$), indexed by all finite sequences of integers $u = u_1 \dots u_n \in \mathcal U$.

Introduce the martingale $(Y_n)_{n \ge 1}$ defined by
$$Y_n := \sum_{u_1 \dots u_n \in \{1, \dots, m\}^n} V_{u_1}^{\lambda} V_{u_1 u_2}^{\lambda} \cdots V_{u_1 \dots u_n}^{\lambda}.$$
By (7), $\mathbb E(Y_n) = (\mathbb E A)^n = 1$. It can easily be seen that
$$Y_{n+1} = \sum_{k=1}^m V_k^{\lambda} Y_{n,k}, \tag{11}$$



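where the $Y_{n,k}$, $1 \le k \le m$, are independent of each other and of $(V_1, \dots, V_m)$, and each has the
same distribution as $Y_n$. Besides, since $\sigma > \frac12$, we have $\sum_{k=1}^m V_k^{2\sigma} < \sum_{k=1}^m V_k = 1$ a.s., hence
$m\, \mathbb E V_1^{2\sigma} < 1$ and, by the Cauchy–Schwarz inequality,
$$\mathbb E|A|^2 \le \mathbb E\Bigl(\sum_{k=1}^m |V_k^{\lambda}|\Bigr)^2 = \mathbb E\Bigl(\sum_{k=1}^m V_k^{\sigma}\Bigr)^2 \le m\, \mathbb E \sum_{k=1}^m V_k^{2\sigma} = m^2\, \mathbb E V_1^{2\sigma} < m.$$
Therefore, for $n \ge 1$, $Y_n$ is square integrable and
$$\operatorname{Var} Y_{n+1} = \bigl(\mathbb E|A|^2 - 1\bigr) + m\, \mathbb E V_1^{2\sigma} \operatorname{Var} Y_n,$$
where $\operatorname{Var} X = \mathbb E(|X - \mathbb E X|^2)$ denotes the variance of $X$. Thus, the martingale $(Y_n)_n$ is bounded in
$L^2$, so that, when $n \to +\infty$,
$$Y_n \to Y_\infty \quad \text{a.s. and in } L^2,$$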
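The variance recursion above can be spelled out as follows. The computation uses $|V_k^{\lambda}|^2 = V_k^{2\sigma}$, the exchangeability of $(V_1, \dots, V_m)$, and $\mathbb E\bigl(Y_{n,j}\overline{Y_{n,k}}\bigr) = |\mathbb E Y_n|^2 = 1$ for $j \neq k$; the convention $Y_0 := 1$ is introduced here only for the iteration.

```latex
\begin{align*}
\mathbb E|Y_{n+1}|^2
  &= \sum_{j,k=1}^m \mathbb E\bigl(V_j^{\lambda}\,\overline{V_k^{\lambda}}\bigr)\,
     \mathbb E\bigl(Y_{n,j}\,\overline{Y_{n,k}}\bigr)
   = \sum_{j \neq k} \mathbb E\bigl(V_j^{\lambda}\,\overline{V_k^{\lambda}}\bigr)
     + m\,\mathbb E V_1^{2\sigma}\,\mathbb E|Y_n|^2 \\
  &= \bigl(\mathbb E|A|^2 - m\,\mathbb E V_1^{2\sigma}\bigr)
     + m\,\mathbb E V_1^{2\sigma}\,\bigl(\operatorname{Var} Y_n + 1\bigr).
\end{align*}
% Subtracting |E Y_{n+1}|^2 = 1 gives the recursion of the text; since Var Y_0 = 0,
\[
\operatorname{Var} Y_n = \bigl(\mathbb E|A|^2 - 1\bigr)\sum_{j=0}^{n-1}\bigl(m\,\mathbb E V_1^{2\sigma}\bigr)^j
\;\le\; \frac{\mathbb E|A|^2 - 1}{1 - m\,\mathbb E V_1^{2\sigma}} .
\]
```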

where $Y_\infty$ is a (complex-valued) random variable with variance
$$\operatorname{Var}(Y_\infty) = \frac{\mathbb E|A|^2 - 1}{1 - m\, \mathbb E V_1^{2\sigma}}.$$
Passing to the limit in Eq. (11) shows that $Y_\infty$ is a solution of Eq. (4) and, by uniqueness, (ii) in
Theorem 6 holds as soon as it holds for $Y_\infty$.

This last fact comes from an adaptation of Lemma 8.29 in [1], giving some constants $C > 0$ and $\varepsilon > 0$
such that for all $t \in \mathbb C$ with $|t| < \varepsilon$, we have (recall that $\mathbb E Y_\infty = 1$)
$$\mathbb E\, e^{\langle t, Y_\infty \rangle} \le e^{\Re(t) + C|t|^2}.$$
The adaptation relies on $\sum_{k=1}^m V_k^{2\sigma} < 1$ a.s. for $\sigma > \frac12$. The last assertion implies that
$\mathbb E\, e^{t |Y_\infty|} < \infty$ for $t > 0$ small enough, so that the exponential moment generating series of $Y_\infty$
has a positive radius of convergence. □
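The cascade construction lends itself to simulation. The Python sketch below is illustrative only and rests on several assumptions. It fixes $m = 27$ and obtains $\lambda_2$ by Newton's method on the equivalent equation $\prod_{k=1}^{m-1}(1 + z/k) = m$, starting from a hypothetical point $0.5 + 2.2i$. It then draws exact samples of $Y_1 = A$ and $Y_2$ through the recursion (11). Sample means close to 1 are consistent with (7). The seed and sample sizes are arbitrary, and since the cost of one draw of $Y_n$ grows like $m^n$, only small $n$ is practical.

```python
from __future__ import annotations

import random


def root_newton(m: int, seed: complex, iters: int = 50) -> complex:
    """Newton's method for prod_{k=1}^{m-1} (1 + z/k) = m, an equivalent form of (2)."""
    z = seed
    for _ in range(iters):
        prod, log_der = complex(1.0), complex(0.0)
        for k in range(1, m):
            prod *= 1 + z / k
            log_der += 1 / (z + k)  # derivative of log(1 + z/k)
        step = (prod / m - 1) / (prod / m * log_der)
        z -= step
        if abs(step) < 1e-14:
            break
    return z


def spacings(m: int, rng: random.Random) -> list[float]:
    """Spacings of m - 1 i.i.d. uniform points on [0, 1]."""
    cuts = sorted(rng.random() for _ in range(m - 1))
    pts = [0.0] + cuts + [1.0]
    return [pts[k] - pts[k - 1] for k in range(1, m + 1)]


def sample_y(n: int, m: int, lam: complex, rng: random.Random) -> complex:
    """One exact draw of Y_n through Y_n = sum_k V_k^lam Y_{n-1,k}, with Y_0 = 1, cf. (11)."""
    if n == 0:
        return complex(1.0)
    return sum(
        (v ** lam if v > 0 else 0j) * sample_y(n - 1, m, lam, rng)
        for v in spacings(m, rng)
    )


M = 27
lam = root_newton(M, seed=0.5 + 2.2j)  # hypothetical seed near lambda_2 for m = 27
rng = random.Random(1)                 # arbitrary seed
mean1 = sum(sample_y(1, M, lam, rng) for _ in range(20_000)) / 20_000
mean2 = sum(sample_y(2, M, lam, rng) for _ in range(1_000)) / 1_000
print(lam, mean1, mean2)
```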